debakarr
GitHub Repository: debakarr/machinelearning
Path: blob/master/Part 2 - Regression/Decision Tree Regression/[Python] Decision Tree Regression.ipynb
Kernel: Python 3

Decision Tree Regression

from IPython.display import Image

Classification And Regression Trees (CART) is a term introduced by Leo Breiman to refer to decision tree algorithms that can be used for either classification or regression.


Image('img/01.png')
Image in a Jupyter notebook
Image('img/02.png')
Image in a Jupyter notebook
Image('img/03.png')
Image in a Jupyter notebook

The algorithm splits the data into several terminal leaves, each of which stores the average of the training points that fall into it. Above we have two independent variables and one dependent variable. Given the values of two new independent variables, we can predict the dependent variable far more precisely than with the naive approach (where, no matter what the two new independent variables are, we would simply assign the average of all the points as the prediction).
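The contrast between the naive approach and the tree's per-leaf averaging can be sketched with made-up numbers (these values are purely illustrative, not taken from the figures above):

```python
import numpy as np

# Made-up 1-D data for illustration: feature x1 and target y.
x1 = np.array([5.0, 10.0, 35.0, 45.0])
y = np.array([10.0, 20.0, 300.0, 400.0])

# Naive approach: one global average for every query.
naive = y.mean()
print(naive)  # 182.5

# Tree-style approach: average only the points in the region
# the query falls into, e.g. the region x1 < 20 for a query x1 = 12.
region_avg = y[x1 < 20].mean()
print(region_avg)  # 15.0 -- much closer to the points near the query
```

For a query with x1 = 12, the regional average of 15.0 is a far better estimate than the global average of 182.5, which is pulled up by points in a completely different region.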

For example, let's say we want to predict the dependent variable for the independent variables X1 = 30 and X2 = 100.

Then from the decision tree we can say that Y = -64.1 (since X1 < 20 => No, X2 < 170 => Yes and X1 < 40 => Yes).
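The walk just described can be hand-coded as nested conditions. Only the -64.1 leaf value comes from the figures; the other leaf averages (LEAF_A, LEAF_B, LEAF_C) are hypothetical placeholders:

```python
# Hand-coded walk of the example tree above.
# LEAF_A/LEAF_B/LEAF_C are hypothetical placeholder averages;
# only the -64.1 leaf value is taken from the figure.
LEAF_A = LEAF_B = LEAF_C = 0.0  # hypothetical values

def predict(x1, x2):
    if x1 < 20:          # X1 < 20?  (x1 = 30 -> No)
        return LEAF_A
    if x2 < 170:         # X2 < 170? (x2 = 100 -> Yes)
        if x1 < 40:      # X1 < 40?  (x1 = 30 -> Yes)
            return -64.1  # terminal leaf average from the figure
        return LEAF_B
    return LEAF_C

print(predict(30, 100))  # -64.1
```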


Data Preprocessing

# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.tree import DecisionTreeRegressor
%matplotlib inline
plt.rcParams['figure.figsize'] = [14, 8]

# Importing the dataset
dataset = pd.read_csv('Position_Salaries.csv')
X = dataset.iloc[:, 1:2].values
y = dataset.iloc[:, 2].values

Fitting the Decision Tree Regression Model to the dataset

regressor = DecisionTreeRegressor(random_state = 42)
regressor.fit(X, y)
DecisionTreeRegressor(criterion='mse', max_depth=None, max_features=None, max_leaf_nodes=None, min_impurity_decrease=0.0, min_impurity_split=None, min_samples_leaf=1, min_samples_split=2, min_weight_fraction_leaf=0.0, presort=False, random_state=42, splitter='best')

Predicting a new result

# predict expects a 2D array of shape (n_samples, n_features)
y_pred = regressor.predict([[6.5]])
y_pred
array([ 150000.])
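A self-contained sketch of the same fit-and-predict steps, using hypothetical salary figures as a stand-in for Position_Salaries.csv (which is not reproduced here). With the default unlimited depth, the tree fits the ten training points exactly, so every training level is predicted back perfectly:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Hypothetical stand-in data: position levels 1-10 and salaries.
X = np.arange(1, 11).reshape(-1, 1)
y = np.array([45000, 50000, 60000, 80000, 110000,
              150000, 200000, 300000, 500000, 1000000], dtype=float)

reg = DecisionTreeRegressor(random_state=42)
reg.fit(X, y)

# predict takes a 2D array: one row per sample, one column per feature.
y_pred = reg.predict([[6.5]])
print(y_pred)

# A fully grown tree interpolates the training set exactly.
print(np.allclose(reg.predict(X), y))  # True
```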

Visualising the Decision Tree Regression results

plt.scatter(X, y, color = 'red')
plt.plot(X, regressor.predict(X), color = 'blue')
plt.title('Truth or Bluff (Regression Model)')
plt.xlabel('Position level')
plt.ylabel('Salary')
plt.show()
Image in a Jupyter notebook

Visualising the Decision Tree Regression results (for higher resolution and smoother curve)

X_grid = np.arange(min(X), max(X), 0.01)
X_grid = X_grid.reshape((len(X_grid), 1))
plt.scatter(X, y, color = 'red')
plt.plot(X_grid, regressor.predict(X_grid), color = 'blue')
plt.title('Truth or Bluff (Regression Model)')
plt.xlabel('Position level')
plt.ylabel('Salary')
plt.show()
Image in a Jupyter notebook

From the above graph it is clear that the model predicts a constant (average) value over each interval. In particular, the predicted Salary for any level between 5.5 and 6.5 is 150000.
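The piecewise-constant behaviour can be checked directly: every query inside the same leaf interval returns the same value. A self-contained sketch, again with hypothetical salary figures standing in for the real dataset:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Hypothetical stand-in data: position levels 1-10 and salaries.
X = np.arange(1, 11).reshape(-1, 1)
y = np.array([45000, 50000, 60000, 80000, 110000,
              150000, 200000, 300000, 500000, 1000000], dtype=float)

reg = DecisionTreeRegressor(random_state=42).fit(X, y)

# Sample several query points strictly inside one leaf interval.
grid = np.arange(5.6, 6.5, 0.1).reshape(-1, 1)
preds = reg.predict(grid)

# All queries in the interval map to the same leaf average.
print(np.unique(preds))
```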